DNA barcoding of reef‐associated fishes of Saint Martin's Island, Northern Bay of Bengal, Bangladesh

Abstract This study employs the DNA barcoding approach to make a molecular taxonomic catalog of reef fishes of Saint Martin's Island (SMI), an ecologically critical area (ECA), and Marine Protected Area (MPA) in Bangladesh. DNA barcoding, along with morphological analysis, confirmed 84 reef‐associated fish species in SMI belonging to 16 orders, 39 families, and 67 genera. A total of 184 sequences were obtained in this study where 151 sequences (534–604 bp) of 81 species were identified from the COI barcode gene and 33 sequences (609 bp) of 19 species from the 16S rRNA gene region which were submitted to the GenBank and Barcode of Life Data System (BOLD). Among these sequences, 70 sequences of the COI gene and 16 sequences of 16S rRNA gene region from 41 species were submitted for the first time into the GenBank from Bangladesh. For molecular characterization analysis, another 37 sequences of 15 reef fish species of SMI were added from previous studies, making a total of 221 DNA sequences which comprised 179 sequences of 96 species for the COI gene and 42 sequences of 26 species for the 16S rRNA gene region. The COI sequences contain 145 haplotypes with 337 polymorphic sites, and the mean genetic distances within species, genera, and families were calculated as 0.34%, 12.26%, and 19.03%, respectively. On the contrary, 16S rRNA sequences comprised 31 haplotypes with 241 polymorphic sites, and the mean genetic divergences within species, genera, and families were 0.94%, 4.72%, and 12.43%, respectively. This study is a significant contribution to the marine biodiversity of Bangladesh which would facilitate the assessment of species diversity for strategizing management action. It is also an important input to the DNA barcode library of reef fishes of the northern Bay of Bengal.


| INTRODUCTION
The Bay of Bengal covers 2,172,000 sq.km in the northeastern Indian Ocean, representing about 12% of the world's coral reefs (BOBLME, 2011). Heavy sediment discharge from the Ganges-Brahmaputra-Meghna River system, which represents about 6% of the world's total riverine sediment input into the oceans, along with a lack of hard substrate, limits the development of viable coral communities and coral reefs in the north and northeast Bay of Bengal (Rajasuriya, 2002; Sheppard, 2000; Spalding et al., 2001). In these relatively turbid coastal waters of the northeastern Bay of Bengal, about 9 km south of the mouth of the Naf River, lies a small, dumbbell-shaped rocky island, Saint Martin's Island (SMI).
Including the rocky platforms extending into the sea, the total area of the island is about 12 sq.km. The island is located on a shallow continental shelf with a maximum depth of 25 m. Its shallow-water marine habitats comprise rocky and sandy intertidal, intertidal rock pools, offshore lagoons, rocky and sandy subtidal, and offshore soft-bottom habitats. Shoreline habitats are sandy beaches and dunes, scattered rocks, and coral boulders, which are also found on the interior of the island (Alam & Hassan, 1997; Tomascik, 1997).
Traditionally, fishes are identified based on morphological features. However, owing to high diversity, dramatic phenotypic changes during ontogenetic development, variable colouration, and sexual dimorphism, reef fish species are sometimes difficult to identify using morphological characteristics alone (Duarte et al., 2017; Hubert et al., 2010; Leis & Carson-Ewart, 2004; Victor et al., 2009). The DNA barcoding technique, which involves sequencing approximately 650 base pairs of the mitochondrial gene cytochrome oxidase subunit I (COI), has emerged as a means to support species identification across taxonomic groups and to uncover biological diversity, and has also proved a reliable tool for species conservation (Floyd et al., 2002; Hebert et al., 2003; Tautz et al., 2003; Ward et al., 2005). It is an effective tool for identifying all life stages, including eggs, larvae, and juveniles (Hubert et al., 2008, 2010, 2015), as well as sexually dimorphic species, species with large phenotypic plasticity, and cryptic species (Sekino & Yamashita, 2013; Winterbottom et al., 2014) that are widely distributed in marine systems, especially among coral reef-associated organisms (Hubert et al., 2012). This tool is also useful for detecting species that are often misidentified or difficult to detect using traditional taxonomic methods (Becker et al., 2015; Burghart et al., 2014; Knowlton et al., 1993; Knowlton, 2000; Ko et al., 2013; Lee & Kim, 2014; Lin et al., 2016). This molecular marker can also provide additional information for identifying unique and new species from marine ecosystems, revealing more biodiversity than previously estimated (Brasier, 2017; Habib, Neogi, Islam, & Nahar, 2019; Jaafar et al., 2012). Thus, the DNA barcoding method now represents the largest effort to catalog biodiversity using molecular approaches, especially for diverse groups of organisms.
In recent years, DNA barcoding has been frequently used to assess coral-associated fish diversity at different locations in the Indo-Pacific region, such as Weh Island (Fadli et al., 2020) and Ambon Harbor (Limmon et al., 2020) in Indonesia, and Mischief Reef in the Nansha Islands (Shan et al., 2021). In Bangladesh, several DNA barcoding studies of fishes from both marine and freshwater habitats have been carried out in the last few years, such as Ahmed et al. (2019), Rahman et al. (2019), Ahmed et al. (2021), and Habib, Neogi, Rahman, Oh, et al. (2021). However, studies focusing exclusively on the DNA barcoding of reef-associated fish species in Bangladesh are lacking. Considering the importance of the ECA and MPA of Bangladesh, as well as the northern Bay of Bengal, the present study aims to assess the diversity and make an updated inventory of reef-associated fishes of SMI through DNA barcoding, and to build a reference library of DNA barcode data for reef-associated fishes of Bangladesh. Molecular studies focusing specifically on reef fishes have rarely been conducted, not only in Bangladesh but also in the entire Bay of Bengal region.

| Collection of samples
Specimens of fish were collected at landing from local fishermen or traders of SMI between May 2017 and July 2019 (Figure 1). According to information provided by the local fishermen, the fish were caught using hook and line and gill nets set on or around the submerged rocks surrounding the island. After tagging, the collected samples were photographed in the field for the best representation of living colour. They were then transferred to and stored in the Aquatic Bioresource Research Lab. (ABR Lab.), Department of Fisheries Biology and Genetics, Sher-e-Bangla Agricultural University (SAU), Dhaka, Bangladesh, for morphological and molecular analysis. The morphological diagnosis (meristic counts and proportional measurements) of collected specimens was performed according to Carpenter and Niem (1999a, 1999b, 2001a, 2001b), Allen et al. (2003), Rahman et al. (2009), Allen and Erdmann (2012), Psomadakis et al. (2019), and Froese and Pauly (2023). We followed Fricke et al. (2023) for the current valid names of genera, species, families, and orders. After species identification by morphological study, a small piece of muscle tissue was cut from each fish specimen and stored in a sterile 1.5 mL tube containing 98% alcohol for subsequent molecular work.
The 16S rRNA sequences were amplified using the primer set 16Sar AAC TCA GAT CACGT-3′) (Palumbi, 1996). The PCR profile consisted of preheating at 95°C for 2 min, followed by 35 cycles of denaturation at 95°C for 1 min, annealing at 54°C for the COI region or at 52°C for the 16S rRNA gene for 40 s, and extension at 72°C for 1 min, and was completed with a final extension at 72°C for 10 min. After PCR, every sample was visualized on a 1% agarose gel (EZ-Vision® In-Gel Solution, USA) stained with ethidium bromide in the gel documentation chamber (Model: Syngene InGenius 3). The UV light was kept on to view the bands on the connected computer using GeneSys software. PCR samples with a single, clearly visible band were purified with a PCR purification kit (TIANGEN-Universal DNA Purification Kit) for sequencing. The concentration of the purified DNA was estimated using a Qubit 3.0 fluorometer. Sequencing was conducted with the same PCR primers by the Sanger method on an automated sequencer (ABI 3730xl DNA Analyzer) at Macrogen Inc. (Korea).

| Data analysis
The obtained consensus sequences were edited based on chromatogram peak clarity using Chromas Lite and Geneious 9.0.5, combined with manual proofreading. COI sequences were checked for stop codons using the Expasy Translate tool (Duvaud et al., 2021). The sequences were aligned using ClustalW in MEGA 7.0 and then matched against reference sequences using the BLAST search engine provided by NCBI and the BOLD database. The consensus sequences obtained from all specimens for both the COI and 16S rRNA gene regions were submitted to the BOLD system (project code: SAU) and NCBI GenBank (accession numbers given in Table 1), where they are accessible to all researchers. In the data analysis, we also added 37 sequences (28 sequences of COI and 9 sequences of the 16S rRNA gene region) of 15 coral-associated fishes of SMI previously deposited in GenBank from different studies conducted in the ABR Lab. (references given in the "source of sequences" column of Table 1).
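The stop-codon check described above can be illustrated with a short sketch. It is not the Expasy Translate tool used in this study; it only scans the three forward reading frames of a sequence for the stop codons of the vertebrate mitochondrial genetic code (TAA, TAG, AGA, and AGG). The function names and the example sequence are hypothetical.

```python
# Stop codons of the vertebrate mitochondrial genetic code (NCBI table 2).
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def stop_codons_in_frame(seq, frame):
    """Return (position, codon) for every stop codon in one forward frame.

    `seq` is a DNA string; alignment gaps ("-") are removed before scanning.
    `frame` is 0, 1, or 2 (offset of the first codon).
    """
    seq = seq.upper().replace("-", "")
    hits = []
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in VERT_MITO_STOPS:
            hits.append((i, codon))
    return hits


def open_frames(seq):
    """Return the forward frames (0, 1, 2) that contain no stop codon."""
    return [f for f in range(3) if not stop_codons_in_frame(seq, f)]


if __name__ == "__main__":
    example = "GGCACCTAAGGC"  # made-up sequence, not from this study
    for frame in range(3):
        print(frame, stop_codons_in_frame(example, frame))
    print("open frames:", open_frames(example))
```

A protein-coding barcode fragment would be expected to have at least one frame free of stop codons; a sequence with stops in all three frames may indicate a sequencing error or a nuclear pseudogene and would merit re-inspection of the chromatogram.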
Pairwise genetic distances at different taxonomic levels (within species, within genera, and within families) and the Barcoding Gap Analysis were calculated with the Kimura two-parameter (K2P) model and Kalign multiple sequence alignment (Lassmann & Sonnhammer, 2005) using the Sequence Analysis Engine of BOLD (http://www.boldsystems.org/). Phylogenetic analysis was performed using the maximum likelihood (ML) method in IQ-TREE v1.6.12 (Nguyen et al., 2015; Trifinopoulos et al., 2016). The robustness of the phylogenetic relationships was evaluated by bootstrap analysis with 100,000 replications (Felsenstein, 1985). We used GTR + F + I + G4 as the best-fit evolutionary model, which was selected by ModelFinder (Kalyaanamoorthy et al., 2017) under the Bayesian information criterion. The K2P distance model (Kimura, 1980) was used to calculate the genetic distances among the sequences in MEGA 7. The ML tree was visualized using FigTree v1.4.3 (Rambaut & Drummond, 2016) and edited in Adobe Illustrator. Sequence composition and GC% at different codon positions of the COI barcode region and overall GC% of the 16S rRNA sequences were measured with the BOLD system analyzer version 3.
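For readers unfamiliar with the K2P model, the following minimal sketch computes the distance between two aligned sequences from the proportions of transitional (P) and transversional (Q) differences, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)] (Kimura, 1980). It is an illustration only, not the BOLD or MEGA implementation; the function name is hypothetical, and sites with gaps or ambiguous bases are simply skipped, which is an assumption of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent
    # (saturated) for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Made-up 10 bp sequences differing by one transition (A<->G).
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Multiplying the result by 100 gives the percentage values reported in the tables.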
The nucleotide diversity, number of polymorphic sites, and haplotype diversity were obtained using the program ARLEQUIN (version 3.5; Schneider et al., 2000).
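As an illustration of these summary statistics, the sketch below computes the number of polymorphic sites, haplotype diversity, and nucleotide diversity from a set of aligned sequences using the textbook definitions of Nei (haplotype diversity h = n/(n - 1) × (1 - Σp_i²); nucleotide diversity as the mean pairwise proportion of differing sites). It is not the ARLEQUIN implementation, which may treat gaps and missing data differently; the function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def polymorphic_sites(seqs):
    """Number of alignment columns with more than one distinct character."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(seqs)  # identical sequences count as one haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    total = 0.0
    for s1, s2 in pairs:
        total += sum(a != b for a, b in zip(s1, s2)) / len(s1)
    return total / len(pairs)


if __name__ == "__main__":
    toy = ["AAAA", "AAAA", "AAAT", "AATT"]  # made-up aligned sequences
    print(polymorphic_sites(toy))
    print(haplotype_diversity(toy))
    print(nucleotide_diversity(toy))
```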

| RESULTS
Morphological and molecular analyses confirmed a total of 84 reef-associated fish species belonging to 16 orders, 39 families, and 67 genera in the present study. Among the identified species, six spe-  (Table 1). A total of 179 COI barcode sequences of reef fishes of SMI were used for molecular characterization and phylogenetic analyses, of which 151 sequences of 81 species were obtained in the present study and 28 sequences of 15 species were taken from previous studies (Table 1). After editing and aligning all of these COI sequences, the length of the consensus sequences was 534-604 bp.
In the phylogenetic tree, the COI barcode sequences discriminated all the species and clustered sequences of the same species under the same nodes with significant bootstrap values of 80%-100% (Figure 2). Comparison with previously known sequences of the same and closely related species in the GenBank database yielded 98%-100% identity, indicating the effectiveness of COI sequences in providing species-level resolution. In addition, the Barcoding Gap Analysis showed that no species lacked a barcode gap (intraspecific K2P distance ≥ interspecific), no species had a high intraspecific distance (>2%), and no species had a low distance to another species (≤2%), which indicates that all of the studied species were identified by the DNA barcode approach.
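The logic of the barcode gap check can be sketched as follows: for each species, the maximum intraspecific distance is compared with the distance to its nearest neighbor (the closest sequence of any other species), and a gap exists when the former is smaller than the latter. The sketch is a simplified illustration, not the BOLD Barcoding Gap Analysis; for brevity it uses the uncorrected p-distance rather than K2P, and all names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Uncorrected proportion of differing sites (simplification; BOLD uses K2P)."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def barcode_gap(records, dist=p_distance):
    """Summarise the barcode gap per species.

    `records` maps a species name to a list of aligned sequences and must
    contain at least two species. Returns a dict mapping each species to
    (max_intraspecific, nearest_neighbor_distance, has_gap).
    """
    result = {}
    for species, seqs in records.items():
        # A species with a single sequence has no intraspecific pairs.
        max_intra = max((dist(a, b) for a, b in combinations(seqs, 2)),
                        default=0.0)
        nearest = min(dist(a, b)
                      for other, other_seqs in records.items()
                      if other != species
                      for a in seqs
                      for b in other_seqs)
        result[species] = (max_intra, nearest, max_intra < nearest)
    return result


if __name__ == "__main__":
    toy = {  # made-up sequences, not data from this study
        "sp_A": ["AAAAAAAAAA", "AAAAAAAAAT"],
        "sp_B": ["CCCCCAAAAA", "CCCCCAAAAA"],
    }
    for name, summary in barcode_gap(toy).items():
        print(name, summary)
```

Any distance function with the same two-argument signature, such as a K2P implementation, can be passed through the `dist` parameter.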
The 179 COI sequences of 96 species comprised 145 haplotypes with 337 polymorphic sites.A total of 82 indels were found.
The nucleotide diversity was calculated as 0.19 ± 0.01 (mean ± SD) and the haplotype diversity was 0.99 ± 0.00 (mean ± SD) for the    2. The overall mean nucleotide base frequencies observed for the 179 COI sequences were 23.73 ± 0.09%, 28.91 ± 0.12%, 28.88 ± 0.14%, and 18.48 ± 0.07% (mean ± SD) for adenine (A), thymine (T), cytosine (C), and guanine (G), respectively. The base composition analysis of the COI sequences showed that the average T content was the highest and the average G content was the lowest; the mean GC content was 47.36%. The GC contents at the first, second, and third codon positions for the 179 sequences of 96 reef-associated fishes were 56.94%, 43.03%, and 42.08%, respectively. The distribution of GC composition across the three codon positions is given in Figure 4.
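The mean GC content follows directly from the base frequencies reported above (C + G = 28.88% + 18.48% = 47.36%). The sketch below shows how overall and codon-position GC percentages can be computed for a single sequence. It is an illustration rather than the BOLD analyzer; it assumes the sequence is ungapped and starts at the first codon position unless a `frame` offset is given, and the function names are hypothetical.

```python
def overall_gc(seq):
    """Percentage of G and C among unambiguous bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases")
    return 100 * sum(b in "GC" for b in bases) / len(bases)


def gc_by_codon_position(seq, frame=0):
    """GC percentage at the first, second, and third codon positions.

    `frame` is the offset (0, 1, or 2) of the first complete codon.
    Ambiguous bases are skipped but still occupy a codon position.
    """
    totals = [0, 0, 0]
    gc = [0, 0, 0]
    for i, base in enumerate(seq.upper()[frame:]):
        if base not in "ACGT":
            continue
        pos = i % 3
        totals[pos] += 1
        if base in "GC":
            gc[pos] += 1
    return tuple(100 * g / t if t else float("nan")
                 for g, t in zip(gc, totals))


if __name__ == "__main__":
    example = "GCAGCTGCC"  # made-up sequence: codons GCA, GCT, GCC
    print(overall_gc(example))
    print(gc_by_codon_position(example))
```

Averaging these per-sequence values over all barcodes would give figures comparable in kind to those reported here, although the exact values depend on how the analyzer handles gaps and reading frame.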
The overall mean distance of the COI sequences was 23.50 ± 0.01% (mean ± SD). A summary of genetic distances at different taxonomic levels, viz., within species, genera, and families, based on the Kimura two-parameter (K2P) distance model is given in Table 3. The genetic distance within species ranged from 0.00% to 1.49%, within genera from 6.05% to 18.77%, and within families from 7.37% to 25.46%.
The sequence divergence of the 179 COI barcode sequences compared at the species and genus levels is shown in Figure 5.

a Denotes "sedis mutabilis" (Fricke et al., 2023), that is, uncertain order-level status of the families mentioned (as per recent phylogenetic studies), herein tentatively placed under Perciformes.
b First time submitted to GenBank from Bangladesh.
c First-ever contribution to GenBank.

rRNA sequences compared at the species and genus levels is shown in Figure 7. The Barcoding Gap Analysis showed that one species lacked a barcode gap (intraspecific ≥ interspecific), one species had a high intraspecific distance (>2%), and no species had a low distance to another species (≤2%), which indicates that most of the studied species were identified by the DNA barcode approach. The mean distance to the nearest neighbor (NN) was 9.23 ± 0.14% (mean ± SD; Figure 8).
The estimated transition/transversion average ratio (R) is 1.51.
Substitution patterns and rates were estimated using the Kimura two-parameter (K2P) model; rates of transitional substitutions are given in bold and those of transversional substitutions in italics in Table 5.

| DISCUSSION
The present study represents the first molecular survey of the reef-associated fish fauna of Bangladesh. It demonstrates the use of DNA barcoding to complement the morphological identification of 84 reef fish species from SMI. These DNA barcodes of reef fishes will contribute significantly to the DNA barcode reference library of marine fishes of Bangladesh and, more broadly, to the global DNA barcode entries. This baseline database is significant for future fisheries management and the biodiversity conservation strategy of this MPA.
Through its rapid development over the past decade and a half, DNA barcoding has become a well-established molecular tool in taxonomic research (Gong et al., 2018). Differences in evolutionary rates provide various DNA barcoding options but make it difficult to find a universal DNA barcode for all species (Gong et al., 2018).
Currently, the mitochondrial genes coding COI and 16S rRNA are considered reliable DNA barcodes for the identification of marine species (Habib, Neogi, Rahman, Oh, et al., 2021). In some recent studies, DNA barcoding by mtDNA COI and 16S rRNA gene sequencing, together with morphological analysis, also revealed several previously unrecognized reef-associated fish species as new records for SMI in the marine waters of Bangladesh. our study as also found in other studies (Alcantara & Yambot, 2016; Bhattacharjee et al., 2012; Cerutti-Pereyra et al., 2012; Filonzi et al., 2010; Joly et al., 2014; Kress & Erickson, 2012; Ratnasingham & Hebert, 2007; Ward, 2009; Zhang & Hanner, 2012). The ML tree showed that all identified species formed separate branches without any overlap between species, which further indicates that our barcode database is suitable for discriminating reef fishes of SMI in Bangladesh.
These genetic distances within species of less than 2% are in agreement with the species delimitation threshold as proposed by Ward (2009) which further supports the branch of each identified species in this study.
Transitions are generally more frequent than transversions in mitochondrial genes, as found in the present study (392 vs. 167 for the COI gene and 303 vs. 164 for the 16S rRNA barcode region) and in other studies by Gojobori et al. (1982), Curtis and Clegg (1984), and Wakeley (1994, 1996). A larger number of transversions than transitions is known to be related to deep divergence and often to sequence saturation (Yang & Yoder, 1999). The mean intraspecific K2P distance of 0.34% for the COI barcode region of reef fishes of SMI is higher than that reported in fish studies from other geographic areas, such as 0.10% in South Africa (Cawthorn et al., 2011), 0.312% in Brazil (Ribeiro et al., 2012), 0.32% in Turkey (Keskİn & Atar, 2013), 0.28% in Pakistan (Karim et al., 2016), and 0.21% in the Taiwan Strait (Bingpeng et al., 2018). On the other hand, higher K2P distances were also found in some studies, such as 0.57% in the fishes of Java and Bali (Dahruddin et al., 2017) and 0.37% in the Pampa Plain, Argentina (Rosso et al., 2012).
The fish species collected from SMI and barcoded in this study fall into different categories of global conservation status according to the IUCN Red List of Threatened Species (IUCN, 2020). Among the 84 identified species, sixty-six species (79%) were categorized as Least Concern (LC), three species (4%) as Data Deficient (DD), and thirteen species (15%) as Not Evaluated (NE), while two species (2%) were listed under the Vulnerable (VU) category. The majority (LC) do not seem to require the additional protection needed for the Critically Endangered, Endangered, Vulnerable, or Near Threatened categories (IUCN, 2020). However, ignoring the management of the LC category is also "unsafe", as these species make up the majority (79%) of the fish. Although the species listed in the NE and DD categories have no or limited biological, ecological, or distributional information, it would be sensible to give this group careful attention, at least until their status is evaluated.
SMI presents a variety of physiographic features such as rocky platforms, sandy beaches, sand dunes, lagoons, marshes, tombolo, crenulated shorelines, and coral clusters (Hoque et al., 1979; Hossain et al., 2007). Several anthropogenic threats were observed during the present survey, such as destruction of coral reefs through indiscriminate anchoring of boats, fishing on coral reef habitats, and throwing garbage into the water. Government and policymakers should come forward to save the marine biodiversity, including the reef-associated fishes, of this natural treasure of Bangladesh, drawing on recommendations from different stakeholders. A sustainable strategic plan is also urgently needed to manage this sole coral island and protect its biodiversity and the livelihood of the local people.
The DNA barcode inventory obtained from this study will contribute to effective monitoring, conservation, and management strategies for the fisheries resources of the only coral island of Bangladesh, as has been done in different regions of the world (Ardura et al., 2010; Lewis et al., 2016; Thomsen et al., 2012; Valdez-Moreno et al., 2012; Weigt et al., 2012).

ACKNOWLEDGMENTS
We are thankful to PIU-BARC of NATP-2 for their cooperation during the study period. We also acknowledge the DRMREEF project of IOC/WESTPAC for technical support. We are also thankful to all of our lab members and other anonymous persons who supported this study.
cies, for example, Canthidermis maculata (Bloch, 1786), Epinephelus fuscoguttatus (Forsskål, 1775), Plectroglyphidodon apicalis (De Vis, 1885), Synodus variegatus (Lacepède, 1803), Opistognathus variabilis Smith-Vaniz, 2009, and Opistognathus rosenbergii Bleeker, 1856 are new distributional records in Bangladesh. A total of 184 sequences (COI and 16S rRNA) were obtained in the study, of which 151 sequences of 81 species were obtained from the COI gene and 33 sequences of 19 species from the 16S rRNA gene region. Among the 81 fish species, 16 were sequenced for both the COI and 16S rRNA gene regions. Among the submitted sequences, 86 sequences (70 sequences from the COI gene and 16 sequences from the 16S rRNA gene region) of 41 species were submitted for the first time to GenBank from Bangladesh
COI sequences. Parsimony informative sites of two, three, and four variants were 103, 25, and 104, respectively. The numbers of transitions and transversions in the studied COI sequences were 392 and 167, respectively. The estimated Transition/Transversion bias (R) was
TABLE 1 GenBank accession number of mitochondrial COI and 16S rRNA sequences used in the present study.
Substitution patterns and rates were estimated using the Kimura 2-parameter model. Rates of different transitional substitutions are shown in bold and those of transversional substitutions in italics in Table
Sequence alignment of the 16S rRNA gene regions of the present study after trimming of primer ends yielded 609 bp long nucleotide sequences. A total of 42 sequences of 26 species were used in the molecular characterization and phylogenetic analysis, of which 33 sequences of 19 species were obtained in the present study and 9 sequences of 7 species were taken from previous studies. In the phylogenetic analysis, the sequences discriminated all species, clustering the same species under the same nodes with significant bootstrap values of 80%-100% (Figure 6). The 16S rRNA sequences obtained from 26 species comprised 31 haplotypes with 241 polymorphic sites. The nucleotide diversity was calculated as 0.132 and the haplotype diversity was 0.984 ± 0.009 (mean ± SD). Parsimony informative sites of two, three, and four variants were 84, 54, and 41, respectively. The numbers of transitions and transversions in the studied 16S rRNA sequences were 303 and 164, respectively. The mean genetic distance (%) among all 16S rRNA sequences was estimated as 15.30 ± 0.01 (mean ± SD). The mean nucleotide base compositions were calculated as A = 28.63 ± 0.17%, T = 22.81 ± 0.19%, C = 25.47 ± 0.19%, and G = 23.10 ± 0.12% (mean ± SD). The base composition analysis of the 16S rRNA sequences showed that the average A content was the highest and the average T content was the lowest. The mean GC content was 48.57%. A summary of genetic distances at different taxonomic levels, viz., within species, within genera, and within families, based on the Kimura two-parameter (K2P) distance model is given in Table 4. The minimum genetic distance within species is 0.00% and the maximum is 6.63%; the minimum genetic distance within genera is 2.95% and the

Maximum and mean intraspecific divergence (% K2P) in the barcode region of COI plotted against nearest neighbor distance (% K2P) for examined species in this study. All comparisons had a barcode gap based on the positions of all points above the red line.
TABLE 2 Estimation of substitution matrix of COI sequences of maximum likelihood.
Codon composition of 179 COI barcodes for reef-associated fish of SMI.
TABLE 3 The distribution of sequence divergence at each taxonomic level of COI sequences.

FIGURE 7 Sequence divergence graph for all 16S rRNA sequences compared at the species and genus levels.
FIGURE 8 Maximum and mean intraspecific divergence (% K2P) in the barcode region of 16S rRNA plotted against nearest neighbor distance (% K2P) for the examined species in this study. All comparisons had a barcode gap based on the positions of all points above the red line.
TABLE 5 Estimation of substitution matrix of 16S rRNA sequences of maximum likelihood.
TABLE 4